Toward a Totally Unsupervised, Language-Independent Method for the Syllabification of Written Texts
Abstract
Unsupervised algorithms for the induction of linguistic knowledge should ideally require as few basic assumptions as possible while, in principle, yielding good results for any language. In practice, however, such algorithms are usually tested on only a few (closely related) languages. This paper presents an approach that draws on typological knowledge to induce syllabic divisions fully automatically from reasonably sized written texts. The approach can account for syllable structures in languages where other approaches would fail, which raises the question of whether computational methods can really be claimed to be language-universal when they have not been tested on the variety of structures found in the world's languages.
Similar resources
Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language
There are several native languages in Peru which are mostly agglutinative. These languages are transmitted from generation to generation mainly in oral form, causing different forms of writing across different communities. For this reason, there are recent efforts to standardize the spelling in the written texts, and it would be beneficial to support these tasks with an automatic tool such as a...
Move-based investigation of appraisal in the introduction section of Applied Linguistics research articles: Similarities and differences between L1 and L2 English texts
Recent research has shown that academic writing is not ‘author-evacuated’ but, rather, carries a representation of the writers’ identity. One way through which writers project their identity in academic writing is stance-taking toward propositions advanced in the text. Appropriate stance-taking has proved to be challenging for novice writers of Research Articles (RAs), especially those writing ...
Heuristic Syllabification and Statistical Syllable-Based Modeling for Speech-Input Topic Identification
We describe a heuristic syllabification method and the use of a statistical syllable n-gram language model for discriminating between a closed set of topics. The syllabification method works by assigning costs to consonant clusters and then splitting the clusters where the cost is minimized. We apply the syllabification on a pronunciation dictionary which maps words to phone sequences; the resu...
Automatic Syllabification for Manipuri language
Developing hand-crafted rules for syllabifying the words of a language is an expensive task. This paper proposes several data-driven methods for automatic syllabification of words written in the Manipuri language. Manipuri is one of the scheduled Indian languages. First, we propose a language-independent rule-based approach formulated using entropy based phonotactic segmentation. Second, we project ...
Unsupervised Learning of a Chinese Spontaneous and Colloquial Speech Lexicon with Content and Filler Phrase Classification
There is a significant lexical difference, in both words and word usage, between spontaneous/colloquial language and the written language. This difference affects the performance of spoken language recognition systems that use statistical language models or context-free grammars because these models are based on the written language rather than the spoken form. There are many filler phrases and colloqu...
Publication date: 2010